BUG: Fix extra decimal places in DataFrame.to_csv() with quoting=csv.QUOTE_NONNUMERIC and float16/float32 dtypes (#60699) #60804
- `doc/source/whatsnew/v3.0.0.rst` file if fixing a bug or adding a new feature.
- `quoting=None` logic for `float` arrays.

Issue

`DataFrame.to_csv()` generates extra decimal places in its output when `quoting=csv.QUOTE_NONNUMERIC`, the DataFrame's `dtype` is `float16` or `float32`, and `float_format=None`.

Reason
`DataFrame.to_csv()` internally uses `get_values_for_csv()`, and when `quoting` is specified (`=csv.QUOTE_NONNUMERIC`), it converts the NumPy `float` array to `object`.

pandas/pandas/core/indexes/base.py, lines 7751 to 7765 in 57d2489
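The effect of that conversion can be reproduced outside pandas. The following is a minimal sketch that assumes only NumPy. The sample values are illustrative, and it does not reproduce the body of `get_values_for_csv()` itself.

```python
import numpy as np

# Illustrative values, not taken from the pandas test suite.
values32 = np.array([8.57, 0.1], dtype=np.float32)
values64 = np.array([8.57, 0.1], dtype=np.float64)

# The conversion the description refers to: NumPy float array -> object array.
obj32 = np.array(values32, dtype="object")
obj64 = np.array(values64, dtype="object")

# Widening float32 to a 64-bit float exposes the float32 rounding error.
print([float(v) for v in obj32])  # [8.569999694824219, 0.10000000149011612]

# float64 values are already 64-bit doubles, so they print unchanged.
print([float(v) for v in obj64])  # [8.57, 0.1]
```

The `float()` calls only make the printed form independent of the NumPy version; the extra digits come from the `float32` storage, not from the printing.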
`np.array(values, dtype="object")` affects `float16`, `float32` and `float64` differently:

- `float16`, `float32`
  - When converted to an `object` array, the internal binary representation of the `float16` or `float32` value is stored inside a Python `float` (equivalent to `numpy.float64`), which has enough precision to display that exact binary representation.
  - This is where the extra decimals come from for `dtype=float16` and `dtype=float32` on conversion to `dtype=object`.
- `float64`
  - `float64` stores decimal numbers such as 8.57 with an extremely small error. Because a Python `float` is also a 64-bit double, that error is not visible when the value is displayed as a Python `float`.
  - When a `float64` NumPy array is converted to `object`, the internal binary representation is transferred directly to the object type, and there are no "extra decimals" in the output.

Fix Implemented
To preserve the decimal representation for `dtype=float16` and `float32`, we convert the NumPy float array to strings and then convert those back to Python `float`, which has the same 64-bit precision as `numpy.float64`.

- `str` preserves the decimal representation and prevents exposing the internal binary representation.
- `float` is necessary so that the values are not treated as strings (which `csv.QUOTE_NONNUMERIC` would quote). Parsing the string into a 64-bit (double precision) float preserves the string representation.

Additionally, in the original code, when `quoting` is `None`, converting first to `str` and then back to `object` is unnecessary work, because the replacement of `na_rep` can be done directly on an object array (`na_rep : str`). Therefore, the `quoting=None` branch was removed to streamline the logic.

Testing
All existing test cases in `test_to_csv.py` pass. Tests were added for DataFrames with `dtype` `float16`, `float32` and `float64`, containing a mix of negative, positive and missing values, with `quoting=csv.QUOTE_NONNUMERIC`.
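The string round-trip described under Fix Implemented can be checked in isolation for the same three dtypes. This is a rough sketch rather than the code or tests from the PR: `to_python_floats` is a hypothetical helper name, and the element-by-element loop is an assumption chosen for clarity, not necessarily how pandas performs the conversion.

```python
import math

import numpy as np


def to_python_floats(arr):
    """Hypothetical helper: round-trip each value through str, then float."""
    # str() of a NumPy float scalar yields the shortest decimal that identifies
    # the value in its own dtype (e.g. "8.57" for float32), so parsing it back
    # gives a 64-bit Python float without the widening noise.
    return np.array([float(str(v)) for v in arr], dtype="object")


results = {}
for dtype in (np.float16, np.float32, np.float64):
    # Mix of negative, positive and missing values, as in the added tests.
    arr = np.array([-1.5, 8.57, np.nan], dtype=dtype)
    results[np.dtype(dtype).name] = to_python_floats(arr)

for name, out in results.items():
    assert type(out[1]) is float       # numeric, so QUOTE_NONNUMERIC leaves it unquoted
    assert out[0] == -1.5 and out[1] == 8.57
    assert math.isnan(out[2])          # missing values survive the round-trip
    print(name, list(out))
```

Keeping the result as Python `float` objects rather than strings matters because `csv.QUOTE_NONNUMERIC` quotes every non-numeric field.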